This guide assumes either macOS or Linux, though Windows is still doable; the parts about the PATH variable may or may not be relevant there.
Anaconda is an incredibly popular Python distribution platform. It's the product of Continuum Analytics, and is the de facto way to interact with the Python ecosystem at this point.
(Translation: Use Anaconda for installing Python. DO NOT USE python.org unless you are super comfortable with micromanaging your Python install and the packages you download)
Installing Anaconda is easy enough: navigate to this link https://www.continuum.io/downloads
The website will auto-detect your operating system and present you with the relevant downloads (but of course you can pick another if you want).
Since I do pretty much everything by command line these days (and Jupyter notebooks provide a direct interface to the command line!), I'll go with the Linux version of the installer (but again, the Windows exe and macOS pkg will provide a GUI installer if you really want it).
In [1]:
# Step 1: right-click the "download" link on the left
# Step 2: select "copy link address"
# Step 3: paste the link into the following bash command, after "wget"
!wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
The download takes a while; it's a big distribution!
Once it's downloaded, all you have to provide is an install prefix, i.e. where you want this Python install to live on your computer.
One choice is /opt/conda, but that's a global configuration. For a local install, use /home/<your_username>/conda or /home/<your_username>/python or whatever you'd like!
In [2]:
!ls # This will show us the files in our current directory
In [3]:
# This is an easy one-liner: it makes the file *executable*
# (strictly needed only to run it as ./Anaconda3-4.4.0-Linux-x86_64.sh; running it through bash works either way)
!chmod +x Anaconda3-4.4.0-Linux-x86_64.sh
In [4]:
# The "-b" flag means "batch", which means the install won't stop to ask us pesky questions
# The "-p" flag expects a path where Python will be installed. I've provided a local one
!bash Anaconda3-4.4.0-Linux-x86_64.sh -b -p ./conda-install
This part will take a while, depending largely on your internet connection. Go grab some coffee!
There's one more small step to be done, and here's where things might get confusing if you've little or no experience installing things by command line.
You've installed Python locally, and that's super-cool--uninstall is quite literally as easy as trashing the directory you put after the -p flag--but the problem is, your computer doesn't know where it is.
In [5]:
!which python
This is a great command to test exactly which python is being executed. In this case, it's a version I'd installed previously, at /opt/python. That is definitely not the version I just installed. How do I alert my operating system that, when I type a command starting with python, I want it to refer to the local one I just installed?
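To make the lookup rule concrete, here is a minimal sketch of how a command name gets resolved against an ordered list of directories. It doesn't touch your real PATH: the two temporary directories and the helper make_fake_python are made up for illustration, and it assumes a Linux or macOS machine.

```python
import os
import shutil
import stat
import tempfile


def make_fake_python(directory):
    """Create a tiny executable file named 'python' inside `directory`."""
    path = os.path.join(directory, "python")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    # Mark the file as executable so it counts as a command.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


# Two hypothetical install locations, each with its own "python".
old_dir = tempfile.mkdtemp(prefix="old-python-")
new_dir = tempfile.mkdtemp(prefix="conda-install-")
old_python = make_fake_python(old_dir)
new_python = make_fake_python(new_dir)

# Only the old directory is on the search path: the old python is found.
found_before = shutil.which("python", path=old_dir)

# Put the new directory first: it now shadows the old one.
found_after = shutil.which("python", path=os.pathsep.join([new_dir, old_dir]))

print(found_before)
print(found_after)
```

The first directory in the list that contains a matching executable wins, which is why the order of entries matters.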
You have two options: both involve editing the PATH environment variable that is more or less singularly responsible for telling your computer where everything is (if you've ever accidentally nuked this variable, you know how important it is). The difference is how your edits persist.
Option 1: Edit your shell startup file (for bash, typically ~/.bashrc).
This makes the changes stick, even after you close down your command prompt. Even after you reboot! Add the following line to the end of that file:
export PATH=/home/<your_username>/conda-install/bin:$PATH
Basically, the FULL path of whatever directory you provided to the Anaconda installer, with /bin tacked on the end.
Also, don't forget the colon and the $PATH suffix! Forgetting them will achieve said nuking of the variable!
Option 2: Change PATH for the current session only.
This essentially changes the variable for as long as you have your command prompt window open. Once you close it down, your computer reverts to whatever configuration it had before. That can be achieved simply by running the same command right in your command prompt window:
export PATH=/home/<your_username>/conda-install/bin:$PATH
I'll do that one here, using the Terminal interface provided by Jupyter notebooks:
Once you type exit on that command prompt, those changes will go away and the PATH variable will revert to whatever it was before, obviously devoid of the changes you made with respect to the local Python install. So if you want the changes to stick, go with Option 1.
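If the colon-and-$PATH business feels abstract, this small sketch spells out what the export line does to the string, and what happens if you leave the suffix off. The directory names and the helper prepend_to_path are hypothetical; nothing here changes your actual environment.

```python
def prepend_to_path(new_dir, current_path):
    """Return a PATH string with `new_dir` placed in front of `current_path`."""
    if not current_path:
        return new_dir
    return new_dir + ":" + current_path


# Hypothetical values, for illustration only.
current_path = "/usr/local/bin:/usr/bin:/bin"
conda_bin = "/home/me/conda-install/bin"

# Equivalent of: export PATH=/home/me/conda-install/bin:$PATH
good_path = prepend_to_path(conda_bin, current_path)

# Equivalent of forgetting ":$PATH": every other directory is gone.
nuked_path = conda_bin

print(good_path.split(":"))
print(nuked_path.split(":"))
```

With the suffix in place, the old directories are still there behind the new one; without it, commands that live in /usr/bin and friends can no longer be found.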
If you use the exe or pkg installers provided by the Windows or macOS versions respectively, they do this exact thing under the hood!
One of the most powerful aspects of Anaconda is the concept of the environment.
Have you heard of virtualenv? It's a Python package that allows you to create "virtual environments" that are completely disconnected from each other. There are plenty of reasons you might want parallel Python environments existing completely separately from one another.
The conda tool, first and foremost, is an environment manager. In fact, when you installed Anaconda, it created an initial "default" environment: using this blueprint as a jumping-off point, you could construct lots of parallel environments using different versions of Python, different versions of the packages, or just different package combinations.
Biggest example: Python 2 versus Python 3.
In [4]:
!conda env list
As you can see, I already have four Python environments up and running for several different versions of Python.
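If you'd rather work with the environment list from Python than read it off the screen, here is a rough sketch. It assumes that conda env list --json prints an object with an "envs" key holding a list of environment paths; the sample text and the helper environment_names are made up for illustration, so check the output of your own conda version before relying on it.

```python
import json
import posixpath


def environment_names(json_text):
    """Map each environment path in conda's JSON output to a short name."""
    paths = json.loads(json_text)["envs"]
    names = {}
    for path in paths:
        parent = posixpath.basename(posixpath.dirname(path))
        # Environments under an "envs" directory are named after their folder;
        # anything else is treated here as the default ("root") environment.
        name = posixpath.basename(path) if parent == "envs" else "root"
        names[name] = path
    return names


# Hypothetical output of `conda env list --json` (an assumption, not real output).
sample = '{"envs": ["/opt/python", "/opt/python/envs/py2", "/opt/python/envs/py36"]}'
print(environment_names(sample))
```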
Building one from scratch is certainly the easiest way of creating a new environment: just give conda a name and a list of core packages, and you're good to go!
In [6]:
!conda create -n myenv -y python=2.7 scikit-learn numpy scipy matplotlib
Ta-daa!
Let's break it down:
conda create
This is the part of the command that specifies that we want a new environment, separate from the default (or "root").
-n myenv
This part provides the name of the environment. It can be pretty much anything other than "root" or the name of an environment you've already created. Try to make the name descriptive of its purpose! (HINT: perhaps name your environments by the project you're working on?...)
-y
This part answers "yes" ahead of time to conda's confirmation prompt, so the command runs without stopping to ask.
python=2.7 scikit-learn numpy scipy matplotlib
This is the list of packages I want my new environment to come with by default: Python (version 2.7), scikit-learn, NumPy, SciPy, and Matplotlib.
After that, conda goes to work! It even provides the instructions at the end for switching into and out of the environment:
source activate myenv drops you into the environment, and once inside it, source deactivate drops you out and back into the default, or root, environment. (Newer conda releases, 4.4 and later, also accept conda activate myenv and conda deactivate.)
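The pieces of the conda create command broken down above can also be put together programmatically, which is handy if you create environments from a script. This is only a sketch: build_create_command is a hypothetical helper, and it just builds the argument list without running anything.

```python
def build_create_command(name, packages, python_version=None):
    """Assemble the argument list for a `conda create` call."""
    if name == "root":
        raise ValueError('"root" is reserved for the default environment')
    # -n names the environment; -y skips the confirmation prompt.
    command = ["conda", "create", "-n", name, "-y"]
    if python_version is not None:
        command.append("python=" + python_version)
    command.extend(packages)
    return command


command = build_create_command(
    "myenv", ["scikit-learn", "numpy", "scipy", "matplotlib"], python_version="2.7"
)
print(" ".join(command))
# To actually run it: import subprocess; subprocess.check_call(command)
```

Keeping the command as a list rather than one long string avoids quoting problems if you later hand it to subprocess.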
In [7]:
!conda env list
If you love an environment and have it set up perfectly and want to introduce an unstable element, you can also clone that environment into a new one.
It works the same as before, but in lieu of a package list (which you can still provide!), the main argument is the --clone option:
In [8]:
!conda create -n myenv2 --clone myenv
In [9]:
!conda env list
There you go! Two identical but completely distinct environments.
You've gone through all this trouble to set up and configure your environment. Over the days/weeks/months, you've installed and removed certain packages. You have the perfect environment.
Now, your friend wants the same environment. We've seen how easy it is to clone existing environments on the same machine, but what about sending the environment schematics elsewhere?
As an example, I maintain a list of Python packages that I use to rebuild my default environment from scratch in the case of catastrophic failure.
In this case, you can export your environments into the super-friendly YAML format:
In [1]:
!conda env export -n myenv2 -f myenv2.yaml
This creates a file myenv2.yaml (which you can name whatever you want) from the "myenv2" environment. Here are the contents of the file:
In [ ]:
# %load myenv2.yaml
name: myenv2
channels:
- menpo
- conda-forge
- defaults
dependencies:
- backports_abc=0.5=py27_0
- blas=1.1=openblas
- ca-certificates=2017.7.27.1=0
- certifi=2017.7.27.1=py27_0
- cycler=0.10.0=py27_0
- dbus=1.10.10=3
- expat=2.2.1=0
- fontconfig=2.12.1=4
- freetype=2.7=1
- functools32=3.2.3.2=py27_1
- gettext=0.19.7=1
- glib=2.51.4=0
- gst-plugins-base=1.8.0=0
- gstreamer=1.8.0=2
- icu=58.1=1
- jpeg=9b=0
- libffi=3.2.1=3
- libiconv=1.14=4
- libpng=1.6.28=0
- libxcb=1.12=1
- libxml2=2.9.4=4
- matplotlib=2.0.2=py27_2
- ncurses=5.9=10
- numpy=1.13.1=py27_blas_openblas_200
- openblas=0.2.19=2
- openssl=1.0.2l=0
- pcre=8.39=0
- pip=9.0.1=py27_0
- pyparsing=2.2.0=py27_0
- pyqt=5.6.0=py27_4
- python=2.7.13=1
- python-dateutil=2.6.1=py27_0
- pytz=2017.2=py27_0
- qt=5.6.2=3
- readline=6.2=0
- scikit-learn=0.19.0=py27_blas_openblas_201
- scipy=0.19.1=py27_blas_openblas_202
- setuptools=36.2.2=py27_0
- singledispatch=3.4.0.3=py27_0
- sip=4.18=py27_1
- six=1.10.0=py27_1
- sqlite=3.13.0=1
- ssl_match_hostname=3.5.0.1=py27_1
- subprocess32=3.2.7=py27_0
- tk=8.5.19=2
- tornado=4.5.1=py27_0
- wheel=0.29.0=py27_0
- xorg-libxau=1.0.8=3
- xorg-libxdmcp=1.1.2=3
- xz=5.2.2=0
- zlib=1.2.11=0
- libgfortran=3.0.0=1
- pip:
  - backports-abc==0.5
  - backports.ssl-match-hostname==3.5.0.1
prefix: /opt/python/envs/myenv2
It's incredibly detailed, down to the exact versions of each package used in the environment. Short of the operating system itself, you can use this to duplicate environments exactly on different computers.
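Each conda entry in that dependencies list follows a name=version=build pattern. As a minimal sketch of how you might pull those pieces apart (parse_spec and PackageSpec are names invented here, and the pip entries with == are deliberately not handled):

```python
from collections import namedtuple

PackageSpec = namedtuple("PackageSpec", ["name", "version", "build"])


def parse_spec(line):
    """Split a 'name=version=build' entry from an exported environment file."""
    text = line.strip()
    if text.startswith("- "):
        text = text[2:]
    parts = text.split("=")
    # Pad with None so that 'name' or 'name=version' entries also work.
    parts += [None] * (3 - len(parts))
    return PackageSpec(*parts[:3])


spec = parse_spec("- numpy=1.13.1=py27_blas_openblas_200")
print(spec.name, spec.version, spec.build)
```

The build string is the part that ties a package to a particular Python version and BLAS flavor, which is why the export can be so exact.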
Then, of course, there's the opposite operation: creating an environment from a YAML file.
NOTE: You'll notice there's a name field in the YAML file. Since we're performing both the export and import on the same machine, there could be a conflict in the name of the environment. Normally, this wouldn't be the case, but here we'll have to fix that one of two ways.
1: Just change the name of the environment in the YAML file to something else.
2: Use the -n flag to specify a name when you create the environment from the file.
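The first option can be done by hand in any text editor, but here is a sketch of doing it in Python. rename_environment is a hypothetical helper; dropping the prefix line is my own assumption (that line records the old install location, so it is machine-specific), not something conda requires.

```python
def rename_environment(yaml_text, new_name):
    """Return the environment file text with a new name and no prefix line."""
    output = []
    for line in yaml_text.splitlines():
        if line.startswith("name:"):
            output.append("name: " + new_name)
        elif line.startswith("prefix:"):
            # Assumption: the prefix points at the old location, so skip it.
            continue
        else:
            output.append(line)
    return "\n".join(output) + "\n"


# A shortened, hypothetical version of the exported file.
sample = (
    "name: myenv2\n"
    "channels:\n"
    "- defaults\n"
    "dependencies:\n"
    "- python=2.7.13=1\n"
    "prefix: /opt/python/envs/myenv2\n"
)
renamed = rename_environment(sample, "myenv3")
print(renamed)
```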
In [3]:
!conda env create -n myenv3 -f myenv2.yaml
In [4]:
!conda env list
Just look at how they multiply!
Note that when you run conda install, conda will also determine if any other packages (dependencies) need to be installed in order to install the package you requested. If you agree with the plan, just hit "y" and the install will proceed.
conda remove
The opposite of the previous command, this will remove the specified packages from your environment (note: it won't outright delete them; the downloaded package files stay in conda's package cache).
Also note: it won't delete any prerequisites it had to install.
Ok, the last real wrinkle in a conda environment, and where (in my view) conda really gets its power: channels.
The philosophy behind conda--an open source package manager not just for Python, but a language-agnostic package manager that worked irrespective of platform, providing fully-built packages for any operating system--was a good one.
Problem is, there are too many platforms out there to fully satisfy them all. Furthermore, with Continuum being the ones to have to rebuild the packages and dependency lists any time an updated package under the Anaconda purview was released, it got to be too much work.
So they developed the idea of channels: distinct build avenues that even the common user can hook into. You can build a package from scratch with highly customized build options, then put that package into a custom channel.
This strategy has reached fever pitch with the conda-forge project.
The goal of this project is to essentially outsource the building of all Python packages to the community, and make them available through the "conda-forge" channel.
You may have noticed several instances throughout this workshop where the output of certain conda commands contained "conda-forge". I am indeed subscribed to this channel, and I highly recommend you do as well. This will give you access to the most recent build recipes, which are often out very shortly after the new version of any given Python package is released; much quicker turnaround than Anaconda's official default channel.
To subscribe to a channel, you can use the "conda config" command. For conda-forge, that is:
conda config --add channels conda-forge
Alternatively, you can install packages directly from channels without needing to be subscribed. You can do this through the "-c" flag in the "conda install" command:
conda install -c conda-forge scipy
This command will install the latest version of SciPy, but specifically from the conda-forge channel.
If you don't specify a channel, conda will automatically select the best matching package from a channel you're subscribed to. This is not necessarily the default Anaconda channel! You can actually specify a hierarchy in the channels you're subscribed to--for instance, conda-forge is my highest-level channel, so whenever I do not specify a channel during install, it defaults to conda-forge first before any others.
You can find out this hierarchy from the command:
conda config --show-sources
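To picture the hierarchy idea, here is a deliberately simplified model: walk the channels in priority order and take the first one that has the package. This is not how conda's solver is implemented (the real thing also weighs versions and dependencies); pick_channel and the channel contents are invented for illustration.

```python
def pick_channel(package, channel_order, channel_contents):
    """Return the first channel, in priority order, that offers `package`."""
    for channel in channel_order:
        if package in channel_contents.get(channel, set()):
            return channel
    return None


# Hypothetical channel contents, for illustration only.
channel_contents = {
    "conda-forge": {"scipy", "numpy"},
    "defaults": {"scipy", "numpy", "anaconda-navigator"},
}
channel_order = ["conda-forge", "defaults"]

print(pick_channel("scipy", channel_order, channel_contents))
print(pick_channel("anaconda-navigator", channel_order, channel_contents))
```

In this toy model, a package that exists in both channels comes from the higher-priority one, and a package that only exists further down the list is still found there.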
The last command to know--and to use frequently--is conda update. This will update the specified packages to their latest versions available through the corresponding channel.
You can specify a package, e.g.
conda update scikit-learn
Or, you can do the "catch-all" to update absolutely everything:
conda update --all
Just be careful with that last one--I've had strange things happen on occasion...
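One way to keep an eye on what a big update actually did is to compare the dependency entries from an environment export taken before the update with one taken after. A minimal sketch, assuming every entry has at least a name and a version; changed_packages and the sample lists are hypothetical:

```python
def changed_packages(before, after):
    """Compare two lists of 'name=version[=build]' strings and report differences."""
    old = dict(item.split("=")[:2] for item in before)
    new = dict(item.split("=")[:2] for item in after)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            # (old version, new version); None means the package was absent.
            changes[name] = (old.get(name), new.get(name))
    return changes


# Hypothetical entries, for illustration only.
before = ["numpy=1.13.1=py27_0", "scipy=0.19.1=py27_0", "six=1.10.0=py27_1"]
after = ["numpy=1.13.3=py27_0", "scipy=0.19.1=py27_0", "tornado=4.5.1=py27_0"]

for name, (old_version, new_version) in changed_packages(before, after).items():
    print(name, old_version, "->", new_version)
```

If something breaks after the update, a list like this at least narrows down which packages to look at first.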
There are lots of online resources to learn about the ins and outs of conda. I would specifically point you to the following:
Happy coding!